Overview

Dataset statistics

Number of variables: 41
Number of observations: 47520
Missing cells: 37330
Missing cells (%): 1.9%
Total size in memory: 14.9 MiB
Average record size in memory: 328.0 B
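
The missing-cell percentage can be checked from the counts above: 41 variables times 47520 observations gives the total number of cells, of which 37330 are missing. A minimal check, with illustrative variable names that are not part of the report:

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 41
n_observations = 47520
missing_cells = 37330

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(total_cells)            # 1948320
print(round(missing_pct, 1))  # 1.9
```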

Variable types

Numeric: 10
Text: 29
Boolean: 2

Alerts

Constant: recorded_by has constant value ""
Imbalance: public_meeting is highly imbalanced (56.0%)
Missing: funder has 2877 (6.1%) missing values
Missing: installer has 2889 (6.1%) missing values
Missing: public_meeting has 2689 (5.7%) missing values
Missing: scheme_management has 3103 (6.5%) missing values
Missing: scheme_name has 23036 (48.5%) missing values
Missing: permit has 2439 (5.1%) missing values
Skewed: amount_tsh is highly skewed (γ1 = 57.2301714)
Skewed: num_private is highly skewed (γ1 = 89.07841041)
Unique: id has unique values
Zeros: amount_tsh has 33331 (70.1%) zeros
Zeros: gps_height has 16275 (34.2%) zeros
Zeros: longitude has 1433 (3.0%) zeros
Zeros: num_private has 46903 (98.7%) zeros
Zeros: population has 17048 (35.9%) zeros
Zeros: construction_year has 16503 (34.7%) zeros
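
The percentages in the zeros and missing alerts are each count divided by the 47520 observations. The sketch below recomputes them from the counts listed above; the helper name `share_pct` is an assumption made for illustration. The report does not say whether a zero is a genuine measurement or a stand-in for a missing value.

```python
N_OBSERVATIONS = 47520  # from "Number of observations"

# Zero counts copied from the Alerts list above.
zero_counts = {
    "amount_tsh": 33331,
    "gps_height": 16275,
    "longitude": 1433,
    "num_private": 46903,
    "population": 17048,
    "construction_year": 16503,
}


def share_pct(count, total=N_OBSERVATIONS):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


zero_pct = {name: share_pct(count) for name, count in zero_counts.items()}
for name, pct in zero_pct.items():
    print(f"{name}: {pct}%")
```

The same helper applies to the missing-value alerts, for example `share_pct(23036)` for scheme_name.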

Reproduction

Analysis started: 2024-02-09 10:28:59.066490
Analysis finished: 2024-02-09 10:29:03.471850
Duration: 4.41 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
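
A report of this kind is typically produced by passing a pandas DataFrame to `ProfileReport` from ydata-profiling. The sketch below is a minimal example under assumptions: the function name, file paths and title are hypothetical, and the settings actually used for this report live in config.json, which is not reproduced here.

```python
def build_report(csv_path, html_path):
    """Profile a CSV file and write an HTML report (paths are hypothetical)."""
    # Local imports: the function can be defined even where the
    # third-party packages are not installed.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    report = ProfileReport(df, title="Overview")
    report.to_file(html_path)
    return report
```

A call might look like `build_report("input.csv", "report.html")`, where both file names are placeholders.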

Variables

id
Real number (ℝ)

UNIQUE 

Distinct: 47520
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37114.48641
Minimum: 0
Maximum: 74247
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 371.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3733.9
Q1: 18555.75
median: 37038
Q3: 55666.25
95-th percentile: 70566.05
Maximum: 74247
Range: 74247
Interquartile range (IQR): 37110.5

Descriptive statistics

Standard deviation: 21445.76541
Coefficient of variation (CV): 0.5778273521
Kurtosis: -1.199055428
Mean: 37114.48641
Median Absolute Deviation (MAD): 18558.5
Skewness: 0.002774336093
Sum: 1763680394
Variance: 459920853.8
Monotonicity: Not monotonic
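
Several of the statistics above are derived from others: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the IQR and range are differences of quantiles. A short consistency check using the values reported for id (variable names are illustrative):

```python
import math

# Values copied from the id statistics above.
mean, std = 37114.48641, 21445.76541
q1, q3 = 18555.75, 55666.25
minimum, maximum = 0, 74247

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

# Compare with the reported figures, allowing for rounding in the report.
print(math.isclose(cv, 0.5778273521, rel_tol=1e-6))
print(math.isclose(variance, 459920853.8, rel_tol=1e-6))
print(iqr, value_range)
```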
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
454                   1      < 0.1%
49218                 1      < 0.1%
10673                 1      < 0.1%
20940                 1      < 0.1%
67861                 1      < 0.1%
68334                 1      < 0.1%
533                   1      < 0.1%
30019                 1      < 0.1%
66595                 1      < 0.1%
17276                 1      < 0.1%
Other values (47510)  47510  > 99.9%

Value  Count  Frequency (%)
0      1      < 0.1%
1      1      < 0.1%
2      1      < 0.1%
4      1      < 0.1%
6      1      < 0.1%

Value  Count  Frequency (%)
74247  1      < 0.1%
74243  1      < 0.1%
74242  1      < 0.1%
74240  1      < 0.1%
74239  1      < 0.1%

amount_tsh
Real number (ℝ)

SKEWED  ZEROS 

Distinct: 96
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 322.0475726
Minimum: 0
Maximum: 350000
Zeros: 33331
Zeros (%): 70.1%
Negative: 0
Negative (%): 0.0%
Memory size: 371.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 20
95-th percentile: 1200
Maximum: 350000
Range: 350000
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 3200.623244
Coefficient of variation (CV): 9.938355435
Kurtosis: 4638.375637
Mean: 322.0475726
Median Absolute Deviation (MAD): 0
Skewness: 57.2301714
Sum: 15303700.65
Variance: 10243989.15
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0                  33331  70.1%
500                2488   5.2%
50                 1986   4.2%
20                 1185   2.5%
1000               1167   2.5%
200                980    2.1%
100                653    1.4%
10                 649    1.4%
30                 607    1.3%
2000               559    1.2%
Other values (86)  3915   8.2%

Value  Count  Frequency (%)
0      33331  70.1%
0.2    2      < 0.1%
0.25   1      < 0.1%
1      2      < 0.1%
2      11     < 0.1%

Value   Count  Frequency (%)
350000  1      < 0.1%
250000  1      < 0.1%
200000  1      < 0.1%
170000  1      < 0.1%
120000  1      < 0.1%
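
The skewness reported for amount_tsh (γ1 = 57.2301714) reflects a long right tail: most values are zero and a few reach the hundreds of thousands. As a rough illustration, the sketch below computes the plain moment coefficient of skewness on toy data. The function name and data are assumptions, and the report's own estimator may apply a bias correction, so its figures need not match this formula exactly.

```python
def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Toy data (not from the dataset): mostly zeros with one large value.
toy = [0, 0, 0, 0, 10]
print(moment_skewness(toy))        # positive: the tail is on the right
print(moment_skewness([1, 2, 3]))  # symmetric data gives zero
```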
Distinct: 351
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 475200
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 31
Unique (%): 0.1%

Sample

1st row: 2013-02-27
2nd row: 2011-03-17
3rd row: 2011-07-10
4th row: 2011-04-12
5th row: 2011-04-05
ValueCountFrequency (%)
2011-03-15 459
 
1.0%
2011-03-17 458
 
1.0%
2013-02-03 442
 
0.9%
2011-03-14 438
 
0.9%
2011-03-16 394
 
0.8%
2011-03-18 381
 
0.8%
2011-03-04 379
 
0.8%
2011-03-19 371
 
0.8%
2013-02-14 368
 
0.8%
2013-01-29 365
 
0.8%
Other values (341) 43465
91.5%

Most occurring characters

ValueCountFrequency (%)
0 111199
23.4%
1 103202
21.7%
- 95040
20.0%
2 83096
17.5%
3 42279
 
8.9%
7 10258
 
2.2%
4 8602
 
1.8%
8 7477
 
1.6%
6 4895
 
1.0%
5 4861
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 380160
80.0%
Dash Punctuation 95040
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 111199
29.3%
1 103202
27.1%
2 83096
21.9%
3 42279
 
11.1%
7 10258
 
2.7%
4 8602
 
2.3%
8 7477
 
2.0%
6 4895
 
1.3%
5 4861
 
1.3%
9 4291
 
1.1%
Dash Punctuation
ValueCountFrequency (%)
- 95040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 475200
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 111199
23.4%
1 103202
21.7%
- 95040
20.0%
2 83096
17.5%
3 42279
 
8.9%
7 10258
 
2.2%
4 8602
 
1.8%
8 7477
 
1.6%
6 4895
 
1.0%
5 4861
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 475200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 111199
23.4%
1 103202
21.7%
- 95040
20.0%
2 83096
17.5%
3 42279
 
8.9%
7 10258
 
2.2%
4 8602
 
1.8%
8 7477
 
1.6%
6 4895
 
1.0%
5 4861
 
1.0%
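
Every value of this variable is 10 characters long and uses only digits and dashes, and the sample rows look like ISO 8601 dates. Parsing them as dates, rather than profiling them as text, would allow range and ordering checks. A minimal sketch with the standard library, using the five sample rows shown above:

```python
from datetime import date

# The five sample rows listed for this variable.
samples = ["2013-02-27", "2011-03-17", "2011-07-10", "2011-04-12", "2011-04-05"]

# date.fromisoformat raises ValueError on anything that is not YYYY-MM-DD.
parsed = [date.fromisoformat(s) for s in samples]

print(min(parsed), max(parsed))  # 2011-03-17 2013-02-27
```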

funder
Text

MISSING 

Distinct1697
Distinct (%)3.8%
Missing2877
Missing (%)6.1%
Memory size371.4 KiB

Length

Max length30
Median length27
Mean length9.919629057
Min length1

Characters and Unicode

Total characters442842
Distinct characters69
Distinct categories9 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique865 ?
Unique (%)1.9%

Sample

1st rowDmdd
2nd rowCmsr
3rd rowKkkt
4th rowKi
5th rowHesawa
ValueCountFrequency (%)
of 7794
 
10.8%
government 7406
 
10.2%
tanzania 7320
 
10.1%
danida 2496
 
3.5%
world 2232
 
3.1%
water 2140
 
3.0%
hesawa 1795
 
2.5%
bank 1133
 
1.6%
rwssp 1107
 
1.5%
kkkt 1107
 
1.5%
Other values (1861) 37753
52.2%

Most occurring characters

ValueCountFrequency (%)
a 54523
 
12.3%
n 46193
 
10.4%
i 30358
 
6.9%
e 30023
 
6.8%
27687
 
6.3%
r 22343
 
5.0%
t 18436
 
4.2%
o 18219
 
4.1%
s 13776
 
3.1%
d 12416
 
2.8%
Other values (59) 168868
38.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 340602
76.9%
Uppercase Letter 71734
 
16.2%
Space Separator 27687
 
6.3%
Other Punctuation 1075
 
0.2%
Decimal Number 651
 
0.1%
Open Punctuation 355
 
0.1%
Close Punctuation 350
 
0.1%
Dash Punctuation 265
 
0.1%
Connector Punctuation 123
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 54523
16.0%
n 46193
13.6%
i 30358
 
8.9%
e 30023
 
8.8%
r 22343
 
6.6%
t 18436
 
5.4%
o 18219
 
5.3%
s 13776
 
4.0%
d 12416
 
3.6%
f 12282
 
3.6%
Other values (16) 82033
24.1%
Uppercase Letter
ValueCountFrequency (%)
T 9696
13.5%
G 8558
11.9%
O 8483
11.8%
D 6329
 
8.8%
W 5908
 
8.2%
C 3735
 
5.2%
R 3541
 
4.9%
H 2802
 
3.9%
M 2488
 
3.5%
A 2363
 
3.3%
Other values (16) 17831
24.9%
Decimal Number
ValueCountFrequency (%)
0 643
98.8%
2 3
 
0.5%
9 2
 
0.3%
1 2
 
0.3%
4 1
 
0.2%
Other Punctuation
ValueCountFrequency (%)
/ 639
59.4%
. 376
35.0%
\ 30
 
2.8%
& 22
 
2.0%
' 8
 
0.7%
Open Punctuation
ValueCountFrequency (%)
( 352
99.2%
[ 3
 
0.8%
Close Punctuation
ValueCountFrequency (%)
) 348
99.4%
] 2
 
0.6%
Space Separator
ValueCountFrequency (%)
27687
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 265
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 123
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 412336
93.1%
Common 30506
 
6.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 54523
 
13.2%
n 46193
 
11.2%
i 30358
 
7.4%
e 30023
 
7.3%
r 22343
 
5.4%
t 18436
 
4.5%
o 18219
 
4.4%
s 13776
 
3.3%
d 12416
 
3.0%
f 12282
 
3.0%
Other values (42) 153767
37.3%
Common
ValueCountFrequency (%)
27687
90.8%
0 643
 
2.1%
/ 639
 
2.1%
. 376
 
1.2%
( 352
 
1.2%
) 348
 
1.1%
- 265
 
0.9%
_ 123
 
0.4%
\ 30
 
0.1%
& 22
 
0.1%
Other values (7) 21
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 442842
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 54523
 
12.3%
n 46193
 
10.4%
i 30358
 
6.9%
e 30023
 
6.8%
27687
 
6.3%
r 22343
 
5.0%
t 18436
 
4.2%
o 18219
 
4.1%
s 13776
 
3.1%
d 12416
 
2.8%
Other values (59) 168868
38.1%

gps_height
Real number (ℝ)

ZEROS 

Distinct2401
Distinct (%)5.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean668.7453704
Minimum-63
Maximum2770
Zeros16275
Zeros (%)34.2%
Negative1203
Negative (%)2.5%
Memory size371.4 KiB

Quantile statistics

Minimum-63
5-th percentile0
Q10
median370
Q31320
95-th percentile1797
Maximum2770
Range2833
Interquartile range (IQR)1320

Descriptive statistics

Standard deviation692.9721534
Coefficient of variation (CV)1.036227216
Kurtosis-1.291294408
Mean668.7453704
Median Absolute Deviation (MAD)370
Skewness0.4621979983
Sum31778780
Variance480210.4054
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 16275
34.2%
-15 52
 
0.1%
-16 49
 
0.1%
-13 45
 
0.1%
-20 43
 
0.1%
1290 42
 
0.1%
-14 41
 
0.1%
-27 39
 
0.1%
1269 39
 
0.1%
1304 39
 
0.1%
Other values (2391) 30856
64.9%
ValueCountFrequency (%)
-63 2
< 0.1%
-59 1
< 0.1%
-57 1
< 0.1%
-55 1
< 0.1%
-54 1
< 0.1%
ValueCountFrequency (%)
2770 1
< 0.1%
2628 1
< 0.1%
2627 1
< 0.1%
2626 2
< 0.1%
2614 1
< 0.1%

installer
Text

MISSING 

Distinct1923
Distinct (%)4.3%
Missing2889
Missing (%)6.1%
Memory size371.4 KiB

Length

Max length30
Median length29
Mean length6.103605118
Min length1

Characters and Unicode

Total characters272410
Distinct characters69
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique974 ?
Unique (%)2.2%

Sample

1st rowDMDD
2nd rowGove
3rd rowKKKT
4th rowKi
5th rowDWE
ValueCountFrequency (%)
dwe 14097
25.8%
government 2177
 
4.0%
water 1495
 
2.7%
hesawa 1153
 
2.1%
rwe 991
 
1.8%
district 964
 
1.8%
kkkt 922
 
1.7%
council 882
 
1.6%
commu 854
 
1.6%
danida 836
 
1.5%
Other values (1791) 30248
55.4%

Most occurring characters

ValueCountFrequency (%)
D 22054
 
8.1%
W 20701
 
7.6%
E 20370
 
7.5%
a 13920
 
5.1%
n 13158
 
4.8%
e 12367
 
4.5%
i 11985
 
4.4%
A 10938
 
4.0%
r 10676
 
3.9%
t 10272
 
3.8%
Other values (59) 125969
46.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 134003
49.2%
Lowercase Letter 126376
46.4%
Space Separator 10099
 
3.7%
Other Punctuation 798
 
0.3%
Decimal Number 639
 
0.2%
Dash Punctuation 222
 
0.1%
Open Punctuation 131
 
< 0.1%
Connector Punctuation 125
 
< 0.1%
Close Punctuation 15
 
< 0.1%
Currency Symbol 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
D 22054
16.5%
W 20701
15.4%
E 20370
15.2%
A 10938
8.2%
C 8449
 
6.3%
S 5354
 
4.0%
R 5189
 
3.9%
I 4975
 
3.7%
T 4779
 
3.6%
K 4275
 
3.2%
Other values (16) 26919
20.1%
Lowercase Letter
ValueCountFrequency (%)
a 13920
11.0%
n 13158
10.4%
e 12367
9.8%
i 11985
9.5%
r 10676
 
8.4%
t 10272
 
8.1%
o 9915
 
7.8%
m 7447
 
5.9%
s 4958
 
3.9%
l 4952
 
3.9%
Other values (16) 26726
21.1%
Other Punctuation
ValueCountFrequency (%)
/ 556
69.7%
. 187
 
23.4%
& 44
 
5.5%
' 10
 
1.3%
# 1
 
0.1%
Decimal Number
ValueCountFrequency (%)
0 636
99.5%
9 1
 
0.2%
4 1
 
0.2%
1 1
 
0.2%
Open Punctuation
ValueCountFrequency (%)
( 129
98.5%
[ 2
 
1.5%
Close Punctuation
ValueCountFrequency (%)
} 13
86.7%
] 2
 
13.3%
Space Separator
ValueCountFrequency (%)
10099
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 222
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 125
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 260379
95.6%
Common 12031
 
4.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
D 22054
 
8.5%
W 20701
 
8.0%
E 20370
 
7.8%
a 13920
 
5.3%
n 13158
 
5.1%
e 12367
 
4.7%
i 11985
 
4.6%
A 10938
 
4.2%
r 10676
 
4.1%
t 10272
 
3.9%
Other values (42) 113938
43.8%
Common
ValueCountFrequency (%)
10099
83.9%
0 636
 
5.3%
/ 556
 
4.6%
- 222
 
1.8%
. 187
 
1.6%
( 129
 
1.1%
_ 125
 
1.0%
& 44
 
0.4%
} 13
 
0.1%
' 10
 
0.1%
Other values (7) 10
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 272410
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
D 22054
 
8.1%
W 20701
 
7.6%
E 20370
 
7.5%
a 13920
 
5.1%
n 13158
 
4.8%
e 12367
 
4.5%
i 11985
 
4.4%
A 10938
 
4.0%
r 10676
 
3.9%
t 10272
 
3.8%
Other values (59) 125969
46.2%
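
The funder and installer samples contain the same labels in different letter case ("Dmdd" and "DMDD", "Kkkt" and "KKKT"), which inflates the distinct counts if the variants refer to the same organisation. That equivalence is an assumption, not something the report states. A minimal normalisation sketch with a hypothetical helper name:

```python
from collections import Counter


def normalize_label(raw):
    """Collapse runs of whitespace and fold case so variants compare equal."""
    return " ".join(raw.split()).casefold()


# Sample rows taken from the funder and installer sections of this report.
labels = ["Dmdd", "DMDD", "Kkkt", "KKKT", "Ki", "Ki", "DWE"]
counts = Counter(normalize_label(label) for label in labels)

print(counts.most_common())
```

Case folding alone does not merge abbreviations or misspellings, so distinct counts after this step are still an upper bound on the number of real organisations.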

longitude
Real number (ℝ)

ZEROS 

Distinct46043
Distinct (%)96.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean34.09131645
Minimum0
Maximum40.34519307
Zeros1433
Zeros (%)3.0%
Negative0
Negative (%)0.0%
Memory size371.4 KiB

Quantile statistics

Minimum0
5-th percentile30.04355494
Q133.08431976
median34.91167698
Q337.18058514
95-th percentile39.13658192
Maximum40.34519307
Range40.34519307
Interquartile range (IQR)4.096265385

Descriptive statistics

Standard deviation6.538402533
Coefficient of variation (CV)0.1917908492
Kurtosis19.36254803
Mean34.09131645
Median Absolute Deviation (MAD)2.03768666
Skewness-4.203791707
Sum1620019.358
Variance42.75070768
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 1433
 
3.0%
39.09568416 2
 
< 0.1%
39.09649867 2
 
< 0.1%
39.09348389 2
 
< 0.1%
39.10124424 2
 
< 0.1%
37.33981057 2
 
< 0.1%
37.53277831 2
 
< 0.1%
32.96700926 2
 
< 0.1%
39.09143391 2
 
< 0.1%
39.09906887 2
 
< 0.1%
Other values (46033) 46069
96.9%
ValueCountFrequency (%)
0 1433
3.0%
29.6071219 1
 
< 0.1%
29.61032056 1
 
< 0.1%
29.61096482 1
 
< 0.1%
29.61194674 1
 
< 0.1%
ValueCountFrequency (%)
40.34519307 1
< 0.1%
40.34430089 1
< 0.1%
40.32523996 1
< 0.1%
40.32522643 1
< 0.1%
40.32340181 1
< 0.1%

latitude
Real number (ℝ)

Distinct46044
Distinct (%)96.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-5.705002278
Minimum-11.64944018
Maximum: -2 × 10⁻⁸
Zeros0
Zeros (%)0.0%
Negative47520
Negative (%)100.0%
Memory size371.4 KiB

Quantile statistics

Minimum-11.64944018
5-th percentile-10.58403249
Q1-8.532465267
median-5.017697195
Q3-3.326464222
95-th percentile-1.417337671
Maximum: -2 × 10⁻⁸
Range11.64944016
Interquartile range (IQR)5.206001045

Descriptive statistics

Standard deviation2.943502774
Coefficient of variation (CV)-0.5159512004
Kurtosis-1.057146654
Mean-5.705002278
Median Absolute Deviation (MAD)2.07045949
Skewness-0.1540169554
Sum-271101.7082
Variance8.664208578
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-2 × 10⁻⁸ 1433
 
3.0%
-6.99261144 2
 
< 0.1%
-2.47667983 2
 
< 0.1%
-6.9802163 2
 
< 0.1%
-6.98945622 2
 
< 0.1%
-6.9813255 2
 
< 0.1%
-7.06537264 2
 
< 0.1%
-6.96247516 2
 
< 0.1%
-2.50658954 2
 
< 0.1%
-6.97826294 2
 
< 0.1%
Other values (46034) 46069
96.9%
ValueCountFrequency (%)
-11.64944018 1
< 0.1%
-11.64837759 1
< 0.1%
-11.58629656 1
< 0.1%
-11.56857679 1
< 0.1%
-11.56680457 1
< 0.1%
ValueCountFrequency (%)
-2 × 10⁻⁸ 1433
3.0%
-0.99846435 1
 
< 0.1%
-0.998916 1
 
< 0.1%
-0.99901209 1
 
< 0.1%
-0.9994692 1
 
< 0.1%
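
longitude has 1433 zeros while its next smallest value is 29.6071219, and latitude takes the value -2 × 10⁻⁸ exactly 1433 times while its next largest value is -0.99846435. This pattern suggests a placeholder coordinate near (0, 0), although the report neither says so nor shows that the two sets of rows coincide. Under that assumption, the sketch below marks such pairs as missing; the function name and tolerance are illustrative choices.

```python
import math


def clean_coordinate(longitude, latitude, tol=1e-6):
    """Return (lon, lat), or (nan, nan) when both are within tol of zero."""
    if abs(longitude) < tol and abs(latitude) < tol:
        return (math.nan, math.nan)
    return (longitude, latitude)


rows = [
    (0.0, -2e-08),              # the suspected placeholder pair
    (29.6071219, -0.99846435),  # illustrative pairing of two reported extremes
]
cleaned = [clean_coordinate(lon, lat) for lon, lat in rows]
print(cleaned)
```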
Distinct30741
Distinct (%)64.7%
Missing1
Missing (%)< 0.1%
Memory size371.4 KiB

Length

Max length30
Median length25
Mean length10.9545445
Min length1

Characters and Unicode

Total characters520549
Distinct characters74
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique27331 ?
Unique (%)57.5%

Sample

1st rowNarmo
2nd rowLukali
3rd rowMahakama
4th rowShule Ya Msingi Chosi A
5th rowKwa Mjowe
ValueCountFrequency (%)
kwa 17072
 
19.5%
none 2858
 
3.3%
mzee 2699
 
3.1%
shuleni 1659
 
1.9%
ya 1196
 
1.4%
shule 1095
 
1.3%
school 880
 
1.0%
primary 827
 
0.9%
zahanati 778
 
0.9%
msingi 693
 
0.8%
Other values (24966) 57611
65.9%

Most occurring characters

ValueCountFrequency (%)
a 78885
15.2%
i 41759
 
8.0%
39853
 
7.7%
n 33594
 
6.5%
e 32920
 
6.3%
w 25319
 
4.9%
K 25060
 
4.8%
o 24321
 
4.7%
u 19321
 
3.7%
M 17624
 
3.4%
Other values (64) 181893
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 394556
75.8%
Uppercase Letter 84048
 
16.1%
Space Separator 39853
 
7.7%
Decimal Number 1343
 
0.3%
Other Punctuation 581
 
0.1%
Dash Punctuation 87
 
< 0.1%
Open Punctuation 26
 
< 0.1%
Close Punctuation 26
 
< 0.1%
Connector Punctuation 16
 
< 0.1%
Modifier Symbol 13
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 78885
20.0%
i 41759
10.6%
n 33594
 
8.5%
e 32920
 
8.3%
w 25319
 
6.4%
o 24321
 
6.2%
u 19321
 
4.9%
l 16705
 
4.2%
m 14168
 
3.6%
h 13757
 
3.5%
Other values (16) 93807
23.8%
Uppercase Letter
ValueCountFrequency (%)
K 25060
29.8%
M 17624
21.0%
S 8558
 
10.2%
N 3923
 
4.7%
A 2789
 
3.3%
B 2723
 
3.2%
C 2256
 
2.7%
P 2039
 
2.4%
L 2014
 
2.4%
J 1889
 
2.2%
Other values (16) 15173
18.1%
Decimal Number
ValueCountFrequency (%)
1 410
30.5%
2 357
26.6%
3 120
 
8.9%
4 93
 
6.9%
7 78
 
5.8%
6 68
 
5.1%
5 67
 
5.0%
8 57
 
4.2%
9 55
 
4.1%
0 38
 
2.8%
Other Punctuation
ValueCountFrequency (%)
' 339
58.3%
. 134
 
23.1%
/ 106
 
18.2%
& 2
 
0.3%
Open Punctuation
ValueCountFrequency (%)
( 20
76.9%
[ 6
 
23.1%
Close Punctuation
ValueCountFrequency (%)
) 20
76.9%
] 6
 
23.1%
Space Separator
ValueCountFrequency (%)
39853
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 87
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 16
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 13
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 478604
91.9%
Common 41945
 
8.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 78885
16.5%
i 41759
 
8.7%
n 33594
 
7.0%
e 32920
 
6.9%
w 25319
 
5.3%
K 25060
 
5.2%
o 24321
 
5.1%
u 19321
 
4.0%
M 17624
 
3.7%
l 16705
 
3.5%
Other values (42) 163096
34.1%
Common
ValueCountFrequency (%)
39853
95.0%
1 410
 
1.0%
2 357
 
0.9%
' 339
 
0.8%
. 134
 
0.3%
3 120
 
0.3%
/ 106
 
0.3%
4 93
 
0.2%
- 87
 
0.2%
7 78
 
0.2%
Other values (12) 368
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 520549
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 78885
15.2%
i 41759
 
8.0%
39853
 
7.7%
n 33594
 
6.5%
e 32920
 
6.3%
w 25319
 
4.9%
K 25060
 
4.8%
o 24321
 
4.7%
u 19321
 
3.7%
M 17624
 
3.4%
Other values (64) 181893
34.9%

num_private
Real number (ℝ)

SKEWED  ZEROS 

Distinct59
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.5045664983
Minimum0
Maximum1776
Zeros46903
Zeros (%)98.7%
Negative0
Negative (%)0.0%
Memory size371.4 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum1776
Range1776
Interquartile range (IQR)0

Descriptive statistics

Standard deviation13.25384979
Coefficient of variation (CV)26.2677959
Kurtosis10076.14263
Mean0.5045664983
Median Absolute Deviation (MAD)0
Skewness89.07841041
Sum23977
Variance175.6645344
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 46903
98.7%
6 61
 
0.1%
1 56
 
0.1%
5 37
 
0.1%
8 37
 
0.1%
32 35
 
0.1%
15 31
 
0.1%
45 31
 
0.1%
39 27
 
0.1%
7 24
 
0.1%
Other values (49) 278
 
0.6%
ValueCountFrequency (%)
0 46903
98.7%
1 56
 
0.1%
2 19
 
< 0.1%
3 22
 
< 0.1%
4 15
 
< 0.1%
ValueCountFrequency (%)
1776 1
< 0.1%
1402 1
< 0.1%
755 1
< 0.1%
698 1
< 0.1%
672 1
< 0.1%

basin
Text

Distinct9
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size371.4 KiB

Length

Max length23
Median length11
Mean length10.89829545
Min length6

Characters and Unicode

Total characters517887
Distinct characters32
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowInternal
2nd rowInternal
3rd rowLake Rukwa
4th rowRufiji
5th rowWami / Ruvu
ValueCountFrequency (%)
lake 19374
22.2%
8404
9.6%
victoria 8205
9.4%
pangani 7143
 
8.2%
rufiji 6375
 
7.3%
internal 6224
 
7.1%
tanganyika 5169
 
5.9%
wami 4804
 
5.5%
ruvu 4804
 
5.5%
nyasa 4014
 
4.6%
Other values (4) 12786
14.6%

Most occurring characters

ValueCountFrequency (%)
a 85614
16.5%
i 46276
 
8.9%
n 40672
 
7.9%
39782
 
7.7%
e 29198
 
5.6%
u 28769
 
5.6%
k 26529
 
5.1%
t 21629
 
4.2%
L 19374
 
3.7%
r 18029
 
3.5%
Other values (22) 162015
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 390803
75.5%
Uppercase Letter 78898
 
15.2%
Space Separator 39782
 
7.7%
Other Punctuation 8404
 
1.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 85614
21.9%
i 46276
11.8%
n 40672
10.4%
e 29198
 
7.5%
u 28769
 
7.4%
k 26529
 
6.8%
t 21629
 
5.5%
r 18029
 
4.6%
o 15405
 
3.9%
g 12312
 
3.2%
Other values (10) 66370
17.0%
Uppercase Letter
ValueCountFrequency (%)
L 19374
24.6%
R 16765
21.2%
V 8205
10.4%
P 7143
 
9.1%
I 6224
 
7.9%
T 5169
 
6.6%
W 4804
 
6.1%
N 4014
 
5.1%
S 3600
 
4.6%
C 3600
 
4.6%
Space Separator
ValueCountFrequency (%)
39782
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 8404
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 469701
90.7%
Common 48186
 
9.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 85614
18.2%
i 46276
 
9.9%
n 40672
 
8.7%
e 29198
 
6.2%
u 28769
 
6.1%
k 26529
 
5.6%
t 21629
 
4.6%
L 19374
 
4.1%
r 18029
 
3.8%
R 16765
 
3.6%
Other values (20) 136846
29.1%
Common
ValueCountFrequency (%)
39782
82.6%
/ 8404
 
17.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 517887
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 85614
16.5%
i 46276
 
8.9%
n 40672
 
7.9%
39782
 
7.7%
e 29198
 
5.6%
u 28769
 
5.6%
k 26529
 
5.1%
t 21629
 
4.2%
L 19374
 
3.7%
r 18029
 
3.5%
Other values (22) 162015
31.3%
Distinct17232
Distinct (%)36.5%
Missing296
Missing (%)0.6%
Memory size371.4 KiB

Length

Max length30
Median length26
Mean length7.899690835
Min length1

Characters and Unicode

Total characters373055
Distinct characters73
Distinct categories10 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique9012 ?
Unique (%)19.1%

Sample

1st rowBashnet Kati
2nd rowLukali
3rd rowChawalikozi
4th rowShuleni
5th rowNgholong
ValueCountFrequency (%)
a 1917
 
3.4%
b 1636
 
2.9%
kati 1529
 
2.7%
wa 488
 
0.9%
shuleni 486
 
0.9%
majengo 481
 
0.8%
madukani 449
 
0.8%
mtaa 420
 
0.7%
juu 330
 
0.6%
mjini 309
 
0.5%
Other values (15351) 48626
85.8%

Most occurring characters

ValueCountFrequency (%)
a 57714
15.5%
i 36697
 
9.8%
n 26878
 
7.2%
u 21144
 
5.7%
e 20480
 
5.5%
o 18727
 
5.0%
M 16364
 
4.4%
g 15119
 
4.1%
l 13022
 
3.5%
m 12046
 
3.2%
Other values (63) 134864
36.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 305049
81.8%
Uppercase Letter 57076
 
15.3%
Space Separator 9448
 
2.5%
Other Punctuation 952
 
0.3%
Decimal Number 459
 
0.1%
Dash Punctuation 31
 
< 0.1%
Modifier Symbol 30
 
< 0.1%
Open Punctuation 4
 
< 0.1%
Close Punctuation 4
 
< 0.1%
Connector Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 57714
18.9%
i 36697
12.0%
n 26878
 
8.8%
u 21144
 
6.9%
e 20480
 
6.7%
o 18727
 
6.1%
g 15119
 
5.0%
l 13022
 
4.3%
m 12046
 
3.9%
b 9466
 
3.1%
Other values (16) 73756
24.2%
Uppercase Letter
ValueCountFrequency (%)
M 16364
28.7%
K 10019
17.6%
N 4830
 
8.5%
B 4116
 
7.2%
I 3564
 
6.2%
S 3272
 
5.7%
A 2466
 
4.3%
C 2009
 
3.5%
L 1969
 
3.4%
U 1388
 
2.4%
Other values (15) 7079
12.4%
Decimal Number
ValueCountFrequency (%)
1 181
39.4%
2 56
 
12.2%
4 41
 
8.9%
3 37
 
8.1%
9 27
 
5.9%
6 26
 
5.7%
5 26
 
5.7%
8 24
 
5.2%
0 22
 
4.8%
7 19
 
4.1%
Other Punctuation
ValueCountFrequency (%)
' 826
86.8%
/ 101
 
10.6%
. 23
 
2.4%
# 2
 
0.2%
Open Punctuation
ValueCountFrequency (%)
( 3
75.0%
[ 1
 
25.0%
Close Punctuation
ValueCountFrequency (%)
) 3
75.0%
] 1
 
25.0%
Space Separator
ValueCountFrequency (%)
9448
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 31
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 30
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 362125
97.1%
Common 10930
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 57714
15.9%
i 36697
 
10.1%
n 26878
 
7.4%
u 21144
 
5.8%
e 20480
 
5.7%
o 18727
 
5.2%
M 16364
 
4.5%
g 15119
 
4.2%
l 13022
 
3.6%
m 12046
 
3.3%
Other values (41) 123934
34.2%
Common
ValueCountFrequency (%)
9448
86.4%
' 826
 
7.6%
1 181
 
1.7%
/ 101
 
0.9%
2 56
 
0.5%
4 41
 
0.4%
3 37
 
0.3%
- 31
 
0.3%
` 30
 
0.3%
9 27
 
0.2%
Other values (12) 152
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 373055
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 57714
15.5%
i 36697
 
9.8%
n 26878
 
7.2%
u 21144
 
5.7%
e 20480
 
5.5%
o 18727
 
5.0%
M 16364
 
4.4%
g 15119
 
4.1%
l 13022
 
3.5%
m 12046
 
3.2%
Other values (63) 134864
36.2%

region
Text

Distinct21
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size371.4 KiB

Length

Max length13
Median length11
Mean length6.620896465
Min length4

Characters and Unicode

Total characters314625
Distinct characters32
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowManyara
2nd rowDodoma
3rd rowMbeya
4th rowMbeya
5th rowMorogoro
ValueCountFrequency (%)
iringa 4254
 
8.7%
shinyanga 3977
 
8.1%
mbeya 3659
 
7.5%
kilimanjaro 3466
 
7.1%
morogoro 3223
 
6.6%
arusha 2692
 
5.5%
kagera 2662
 
5.5%
mwanza 2475
 
5.1%
kigoma 2255
 
4.6%
pwani 2115
 
4.3%
Other values (13) 18050
37.0%

Most occurring characters

ValueCountFrequency (%)
a 66715
21.2%
n 26496
 
8.4%
r 25982
 
8.3%
i 25361
 
8.1%
o 23701
 
7.5%
g 20087
 
6.4%
M 13587
 
4.3%
m 10235
 
3.3%
y 8902
 
2.8%
K 8383
 
2.7%
Other values (22) 85176
27.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 265143
84.3%
Uppercase Letter 48174
 
15.3%
Space Separator 1308
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 66715
25.2%
n 26496
 
10.0%
r 25982
 
9.8%
i 25361
 
9.6%
o 23701
 
8.9%
g 20087
 
7.6%
m 10235
 
3.9%
y 8902
 
3.4%
u 8356
 
3.2%
w 7418
 
2.8%
Other values (11) 41890
15.8%
Uppercase Letter
ValueCountFrequency (%)
M 13587
28.2%
K 8383
17.4%
S 6295
13.1%
I 4254
 
8.8%
T 3630
 
7.5%
R 3559
 
7.4%
A 2692
 
5.6%
D 2409
 
5.0%
P 2115
 
4.4%
L 1250
 
2.6%
Space Separator
ValueCountFrequency (%)
1308
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 313317
99.6%
Common 1308
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 66715
21.3%
n 26496
 
8.5%
r 25982
 
8.3%
i 25361
 
8.1%
o 23701
 
7.6%
g 20087
 
6.4%
M 13587
 
4.3%
m 10235
 
3.3%
y 8902
 
2.8%
K 8383
 
2.7%
Other values (21) 83868
26.8%
Common
ValueCountFrequency (%)
1308
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 314625
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 66715
21.2%
n 26496
 
8.4%
r 25982
 
8.3%
i 25361
 
8.1%
o 23701
 
7.5%
g 20087
 
6.4%
M 13587
 
4.3%
m 10235
 
3.3%
y 8902
 
2.8%
K 8383
 
2.7%
Other values (22) 85176
27.1%

region_code
Real number (ℝ)

Distinct27
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15.32651515
Minimum1
Maximum99
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size371.4 KiB

Quantile statistics

Minimum1
5-th percentile2
Q15
median12
Q317
95-th percentile60
Maximum99
Range98
Interquartile range (IQR)12

Descriptive statistics

Standard deviation17.61879844
Coefficient of variation (CV)1.149563241
Kurtosis10.20445601
Mean15.32651515
Median Absolute Deviation (MAD)6
Skewness3.162410202
Sum728316
Variance310.4220584
MonotonicityNot monotonic
Histogram with fixed size bins (bins=27)
ValueCountFrequency (%)
11 4259
 
9.0%
17 4000
 
8.4%
12 3659
 
7.7%
3 3466
 
7.3%
5 3249
 
6.8%
18 2669
 
5.6%
2 2430
 
5.1%
19 2429
 
5.1%
16 2255
 
4.7%
10 2105
 
4.4%
Other values (17) 16999
35.8%
ValueCountFrequency (%)
1 1755
3.7%
2 2430
5.1%
3 3466
7.3%
4 2026
4.3%
5 3249
6.8%
ValueCountFrequency (%)
99 343
 
0.7%
90 722
1.5%
80 1002
2.1%
60 839
1.8%
40 1
 
< 0.1%
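
region has 21 distinct names while region_code has 27 distinct codes, so at least some names must be paired with more than one code. The report does not show which ones. The sketch below illustrates one way to list them; the rows are hypothetical placeholders, not values from the dataset.

```python
from collections import defaultdict


def codes_per_region(pairs):
    """Map each region name to the set of region codes seen with it."""
    mapping = defaultdict(set)
    for region, code in pairs:
        mapping[region].add(code)
    return dict(mapping)


# Hypothetical (region, region_code) rows; the real pairing is not in the report.
rows = [
    ("region_a", 11),
    ("region_b", 6),
    ("region_a", 11),
    ("region_b", 60),
    ("region_c", 12),
]
mapping = codes_per_region(rows)

# Keep only the regions that appear with more than one code.
ambiguous = {name: sorted(codes) for name, codes in mapping.items() if len(codes) > 1}
print(ambiguous)
```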

district_code
Real number (ℝ)

Distinct20
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean5.639309764
Minimum0
Maximum80
Zeros19
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size371.4 KiB

Quantile statistics

Minimum0
5-th percentile1
Q12
median3
Q35
95-th percentile30
Maximum80
Range80
Interquartile range (IQR)3

Descriptive statistics

Standard deviation9.661284976
Coefficient of variation (CV)1.713203456
Kurtosis16.05677349
Mean5.639309764
Median Absolute Deviation (MAD)1
Skewness3.948457815
Sum267980
Variance93.34042739
MonotonicityNot monotonic
Histogram with fixed size bins (bins=20)
ValueCountFrequency (%)
1 9755
20.5%
2 8991
18.9%
3 7999
16.8%
4 7166
15.1%
5 3467
 
7.3%
6 3275
 
6.9%
7 2663
 
5.6%
8 818
 
1.7%
30 795
 
1.7%
33 687
 
1.4%
Other values (10) 1904
 
4.0%
ValueCountFrequency (%)
0 19
 
< 0.1%
1 9755
20.5%
2 8991
18.9%
3 7999
16.8%
4 7166
15.1%
ValueCountFrequency (%)
80 8
 
< 0.1%
67 3
 
< 0.1%
63 157
0.3%
62 87
0.2%
60 55
 
0.1%

lga
Text

Distinct: 125
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 16
Median length: 14
Mean length: 7.42081229
Min length: 3

Characters and Unicode

Total characters: 352637
Distinct characters: 41
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
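
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own code; the helper name `category_counts` is hypothetical, and the input is just the five lga sample rows from this report. The category codes `Lu`, `Ll` and `Zs` correspond to the "Uppercase Letter", "Lowercase Letter" and "Space Separator" labels used in the tables.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over all characters in `values`."""
    counts = Counter()
    for text in values:
        for ch in text:
            counts[unicodedata.category(ch)] += 1
    return counts


# Illustrative input: the five lga sample rows from this report.
sample = ["Babati", "Bahi", "Mbozi", "Mbarali", "Kilosa"]
counts = category_counts(sample)
print(counts)
```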

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Babati
2nd row: Bahi
3rd row: Mbozi
4th row: Mbarali
5th row: Kilosa

Value               Count  Frequency (%)
rural               7642   13.5%
njombe              2001   3.5%
urban               1334   2.4%
moshi               1070   1.9%
arusha              1055   1.9%
bariadi             948    1.7%
singida             922    1.6%
kilosa              876    1.6%
rungwe              866    1.5%
mbozi               844    1.5%
Other values (106)  38938  68.9%

Most occurring characters

ValueCountFrequency (%)
a 55893
15.9%
o 24121
 
6.8%
i 23581
 
6.7%
u 22727
 
6.4%
r 21574
 
6.1%
e 18148
 
5.1%
n 18019
 
5.1%
l 15384
 
4.4%
g 14704
 
4.2%
M 12826
 
3.6%
Other values (31) 125660
35.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 287165
81.4%
Uppercase Letter 56496
 
16.0%
Space Separator 8976
 
2.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 55893
19.5%
o 24121
 
8.4%
i 23581
 
8.2%
u 22727
 
7.9%
r 21574
 
7.5%
e 18148
 
6.3%
n 18019
 
6.3%
l 15384
 
5.4%
g 14704
 
5.1%
m 12504
 
4.4%
Other values (14) 60510
21.1%
Uppercase Letter
ValueCountFrequency (%)
M 12826
22.7%
R 9745
17.2%
K 9327
16.5%
S 5007
 
8.9%
N 4607
 
8.2%
B 3864
 
6.8%
U 2728
 
4.8%
I 1978
 
3.5%
L 1734
 
3.1%
T 1096
 
1.9%
Other values (6) 3584
 
6.3%
Space Separator
ValueCountFrequency (%)
(space) 8976
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 343661
97.5%
Common 8976
 
2.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 55893
16.3%
o 24121
 
7.0%
i 23581
 
6.9%
u 22727
 
6.6%
r 21574
 
6.3%
e 18148
 
5.3%
n 18019
 
5.2%
l 15384
 
4.5%
g 14704
 
4.3%
M 12826
 
3.7%
Other values (30) 116684
34.0%
Common
ValueCountFrequency (%)
(space) 8976
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 352637
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 55893
15.9%
o 24121
 
6.8%
i 23581
 
6.7%
u 22727
 
6.4%
r 21574
 
6.1%
e 18148
 
5.1%
n 18019
 
5.1%
l 15384
 
4.4%
g 14704
 
4.2%
M 12826
 
3.6%
Other values (31) 125660
35.6%

ward
Text

Distinct: 2076
Distinct (%): 4.4%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 23
Median length: 19
Mean length: 7.49760101
Min length: 3

Characters and Unicode

Total characters: 356286
Distinct characters: 54
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44
Unique (%): 0.1%

Sample

1st row: Bashinet
2nd row: Lamaiti
3rd row: Ndalambo
4th row: Chimala
5th row: Chakwale

Value                Count  Frequency (%)
mashariki            464    0.9%
urban                430    0.8%
siha                 346    0.7%
kusini               305    0.6%
magharibi            291    0.6%
igosi                242    0.5%
masama               241    0.5%
machame              221    0.4%
kati                 218    0.4%
imalinyi             203    0.4%
Other values (2089)  48833  94.3%

Most occurring characters

ValueCountFrequency (%)
a 55539
15.6%
i 32184
 
9.0%
n 23548
 
6.6%
u 21647
 
6.1%
o 20887
 
5.9%
e 18776
 
5.3%
g 16925
 
4.8%
M 15049
 
4.2%
m 12995
 
3.6%
l 12565
 
3.5%
Other values (44) 126171
35.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 299424
84.0%
Uppercase Letter 51590
 
14.5%
Space Separator 4303
 
1.2%
Other Punctuation 951
 
0.3%
Dash Punctuation 18
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 55539
18.5%
i 32184
10.7%
n 23548
 
7.9%
u 21647
 
7.2%
o 20887
 
7.0%
e 18776
 
6.3%
g 16925
 
5.7%
m 12995
 
4.3%
l 12565
 
4.2%
r 10479
 
3.5%
Other values (15) 73879
24.7%
Uppercase Letter
ValueCountFrequency (%)
M 15049
29.2%
K 8960
17.4%
I 4863
 
9.4%
N 4716
 
9.1%
S 2704
 
5.2%
L 2559
 
5.0%
B 2534
 
4.9%
U 2320
 
4.5%
C 1693
 
3.3%
R 1341
 
2.6%
Other values (15) 4851
 
9.4%
Other Punctuation
ValueCountFrequency (%)
' 830
87.3%
/ 121
 
12.7%
Space Separator
ValueCountFrequency (%)
(space) 4303
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 18
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 351014
98.5%
Common 5272
 
1.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 55539
15.8%
i 32184
 
9.2%
n 23548
 
6.7%
u 21647
 
6.2%
o 20887
 
6.0%
e 18776
 
5.3%
g 16925
 
4.8%
M 15049
 
4.3%
m 12995
 
3.7%
l 12565
 
3.6%
Other values (40) 120899
34.4%
Common
ValueCountFrequency (%)
(space) 4303
81.6%
' 830
 
15.7%
/ 121
 
2.3%
- 18
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 356286
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 55539
15.6%
i 32184
 
9.0%
n 23548
 
6.6%
u 21647
 
6.1%
o 20887
 
5.9%
e 18776
 
5.3%
g 16925
 
4.8%
M 15049
 
4.2%
m 12995
 
3.6%
l 12565
 
3.5%
Other values (44) 126171
35.4%

population
Real number (ℝ)

ZEROS 

Distinct: 971
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 179.5282828
Minimum: 0
Maximum: 30500
Zeros: 17048
Zeros (%): 35.9%
Negative: 0
Negative (%): 0.0%
Memory size: 371.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 25
Q3: 213
95-th percentile: 678
Maximum: 30500
Range: 30500
Interquartile range (IQR): 213

Descriptive statistics

Standard deviation: 472.7729975
Coefficient of variation (CV): 2.633417922
Kurtosis: 468.8981322
Mean: 179.5282828
Median Absolute Deviation (MAD): 25
Skewness: 13.53762994
Sum: 8531184
Variance: 223514.3071
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
0                   17048  35.9%
1                   5655   11.9%
200                 1553   3.3%
150                 1512   3.2%
250                 1364   2.9%
300                 1173   2.5%
100                 940    2.0%
50                  924    1.9%
500                 825    1.7%
350                 778    1.6%
Other values (961)  15748  33.1%

Value  Count  Frequency (%)
0      17048  35.9%
1      5655   11.9%
2      3      < 0.1%
3      4      < 0.1%
4      12     < 0.1%

Value  Count  Frequency (%)
30500  1      < 0.1%
15300  1      < 0.1%
11463  1      < 0.1%
10000  1      < 0.1%
9865   1      < 0.1%

public_meeting
Boolean

IMBALANCE  MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 2689
Missing (%): 5.7%
Memory size: 371.4 KiB

Value      Count  Frequency (%)
True       40743  85.7%
False      4088   8.6%
(Missing)  2689   5.7%
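
One way to reproduce the imbalance figure flagged for public_meeting is sketched below. It assumes the score is one minus the Shannon entropy of the non-missing class counts, normalised by the maximum entropy for that number of classes; this is an assumption about how the profiler derives the value, not something this report states. The helper name `imbalance_score` is hypothetical. With the True and False counts above, the result comes out close to the 56.0% shown in the alerts.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k) for k >= 2 class counts.

    0 means perfectly balanced classes; values near 1 mean one class dominates.
    """
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))


# True and False counts for public_meeting; missing rows are left out.
score = imbalance_score([40743, 4088])
print(f"{score:.3f}")
```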

recorded_by
Text

CONSTANT 

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 23
Median length: 23
Mean length: 23
Min length: 23

Characters and Unicode

Total characters: 1092960
Distinct characters: 14
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: GeoData Consultants Ltd
2nd row: GeoData Consultants Ltd
3rd row: GeoData Consultants Ltd
4th row: GeoData Consultants Ltd
5th row: GeoData Consultants Ltd

Value        Count  Frequency (%)
geodata      47520  33.3%
consultants  47520  33.3%
ltd          47520  33.3%

Most occurring characters

ValueCountFrequency (%)
t 190080
17.4%
a 142560
13.0%
o 95040
8.7%
(space) 95040
8.7%
n 95040
8.7%
s 95040
8.7%
G 47520
 
4.3%
e 47520
 
4.3%
D 47520
 
4.3%
C 47520
 
4.3%
Other values (4) 190080
17.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 807840
73.9%
Uppercase Letter 190080
 
17.4%
Space Separator 95040
 
8.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 190080
23.5%
a 142560
17.6%
o 95040
11.8%
n 95040
11.8%
s 95040
11.8%
e 47520
 
5.9%
u 47520
 
5.9%
l 47520
 
5.9%
d 47520
 
5.9%
Uppercase Letter
ValueCountFrequency (%)
G 47520
25.0%
D 47520
25.0%
C 47520
25.0%
L 47520
25.0%
Space Separator
ValueCountFrequency (%)
(space) 95040
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 997920
91.3%
Common 95040
 
8.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 190080
19.0%
a 142560
14.3%
o 95040
9.5%
n 95040
9.5%
s 95040
9.5%
G 47520
 
4.8%
e 47520
 
4.8%
D 47520
 
4.8%
C 47520
 
4.8%
u 47520
 
4.8%
Other values (3) 142560
14.3%
Common
ValueCountFrequency (%)
(space) 95040
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1092960
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 190080
17.4%
a 142560
13.0%
o 95040
8.7%
(space) 95040
8.7%
n 95040
8.7%
s 95040
8.7%
G 47520
 
4.3%
e 47520
 
4.3%
D 47520
 
4.3%
C 47520
 
4.3%
Other values (4) 190080
17.4%

scheme_management
Text

MISSING 

Distinct: 11
Distinct (%): < 0.1%
Missing: 3103
Missing (%): 6.5%
Memory size: 371.4 KiB

Length

Max length: 16
Median length: 3
Mean length: 4.642073981
Min length: 3

Characters and Unicode

Total characters: 206187
Distinct characters: 28
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Water Board
2nd row: VWC
3rd row: VWC
4th row: VWC
5th row: VWC

Value             Count  Frequency (%)
vwc               29462  59.0%
water             4697   9.4%
wug               4161   8.3%
authority         2522   5.0%
wua               2312   4.6%
board             2175   4.4%
parastatal        1346   2.7%
private           862    1.7%
operator          862    1.7%
company           820    1.6%
Other values (3)  757    1.5%

Most occurring characters

ValueCountFrequency (%)
W 40707
19.7%
C 30357
14.7%
V 29462
14.3%
a 17322
8.4%
t 14839
 
7.2%
r 14008
 
6.8%
o 7241
 
3.5%
e 7047
 
3.4%
U 6473
 
3.1%
(space) 5559
 
2.7%
Other values (18) 33172
16.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 118612
57.5%
Lowercase Letter 82016
39.8%
Space Separator 5559
 
2.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 17322
21.1%
t 14839
18.1%
r 14008
17.1%
o 7241
8.8%
e 7047
8.6%
i 3384
 
4.1%
y 3342
 
4.1%
h 3148
 
3.8%
u 2578
 
3.1%
d 2175
 
2.7%
Other values (6) 6932
8.5%
Uppercase Letter
ValueCountFrequency (%)
W 40707
34.3%
C 30357
25.6%
V 29462
24.8%
U 6473
 
5.5%
G 4161
 
3.5%
A 2312
 
1.9%
P 2208
 
1.9%
B 2175
 
1.8%
O 626
 
0.5%
S 75
 
0.1%
Space Separator
ValueCountFrequency (%)
(space) 5559
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 200628
97.3%
Common 5559
 
2.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
W 40707
20.3%
C 30357
15.1%
V 29462
14.7%
a 17322
8.6%
t 14839
 
7.4%
r 14008
 
7.0%
o 7241
 
3.6%
e 7047
 
3.5%
U 6473
 
3.2%
G 4161
 
2.1%
Other values (17) 29011
14.5%
Common
ValueCountFrequency (%)
(space) 5559
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 206187
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
W 40707
19.7%
C 30357
14.7%
V 29462
14.3%
a 17322
8.4%
t 14839
 
7.2%
r 14008
 
6.8%
o 7241
 
3.5%
e 7047
 
3.4%
U 6473
 
3.1%
(space) 5559
 
2.7%
Other values (18) 33172
16.1%

scheme_name
Text

MISSING 

Distinct: 2540
Distinct (%): 10.4%
Missing: 23036
Missing (%): 48.5%
Memory size: 371.4 KiB

Length

Max length: 46
Median length: 37
Mean length: 14.50743343
Min length: 1

Characters and Unicode

Total characters: 355200
Distinct characters: 67
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 678
Unique (%): 2.8%

Sample

1st row: Olikimo water project
2nd row: S
3rd row: Fufu
4th row: Malemeu gravity water supply
5th row: M

Value                Count  Frequency (%)
water                7814   13.7%
supply               5394   9.4%
scheme               2019   3.5%
wa                   1712   3.0%
gravity              1521   2.7%
maji                 1076   1.9%
pipe                 1070   1.9%
mradi                875    1.5%
line                 815    1.4%
supplied             681    1.2%
Other values (2391)  34137  59.8%

Most occurring characters

ValueCountFrequency (%)
a 38748
 
10.9%
(space) 33010
 
9.3%
e 27675
 
7.8%
i 21085
 
5.9%
p 17915
 
5.0%
r 17443
 
4.9%
t 15351
 
4.3%
u 14711
 
4.1%
l 13851
 
3.9%
n 13658
 
3.8%
Other values (57) 141753
39.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 280627
79.0%
Uppercase Letter 39741
 
11.2%
Space Separator 33010
 
9.3%
Other Punctuation 1040
 
0.3%
Dash Punctuation 430
 
0.1%
Open Punctuation 157
 
< 0.1%
Decimal Number 118
 
< 0.1%
Modifier Symbol 52
 
< 0.1%
Close Punctuation 25
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 38748
13.8%
e 27675
 
9.9%
i 21085
 
7.5%
p 17915
 
6.4%
r 17443
 
6.2%
t 15351
 
5.5%
u 14711
 
5.2%
l 13851
 
4.9%
n 13658
 
4.9%
o 13493
 
4.8%
Other values (16) 86697
30.9%
Uppercase Letter
ValueCountFrequency (%)
M 7422
18.7%
K 4500
11.3%
N 3042
 
7.7%
S 3041
 
7.7%
A 2246
 
5.7%
I 2134
 
5.4%
W 2067
 
5.2%
B 1928
 
4.9%
L 1669
 
4.2%
U 1429
 
3.6%
Other values (15) 10263
25.8%
Decimal Number
ValueCountFrequency (%)
2 52
44.1%
3 41
34.7%
1 6
 
5.1%
4 6
 
5.1%
7 5
 
4.2%
5 4
 
3.4%
0 2
 
1.7%
6 2
 
1.7%
Other Punctuation
ValueCountFrequency (%)
' 743
71.4%
/ 290
 
27.9%
& 7
 
0.7%
Space Separator
ValueCountFrequency (%)
(space) 33010
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 430
100.0%
Open Punctuation
ValueCountFrequency (%)
( 157
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 52
100.0%
Close Punctuation
ValueCountFrequency (%)
) 25
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 320368
90.2%
Common 34832
 
9.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 38748
 
12.1%
e 27675
 
8.6%
i 21085
 
6.6%
p 17915
 
5.6%
r 17443
 
5.4%
t 15351
 
4.8%
u 14711
 
4.6%
l 13851
 
4.3%
n 13658
 
4.3%
o 13493
 
4.2%
Other values (41) 126438
39.5%
Common
ValueCountFrequency (%)
(space) 33010
94.8%
' 743
 
2.1%
- 430
 
1.2%
/ 290
 
0.8%
( 157
 
0.5%
` 52
 
0.1%
2 52
 
0.1%
3 41
 
0.1%
) 25
 
0.1%
& 7
 
< 0.1%
Other values (6) 25
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 355200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 38748
 
10.9%
(space) 33010
 
9.3%
e 27675
 
7.8%
i 21085
 
5.9%
p 17915
 
5.0%
r 17443
 
4.9%
t 15351
 
4.3%
u 14711
 
4.1%
l 13851
 
3.9%
n 13658
 
3.8%
Other values (57) 141753
39.9%

permit
Boolean

MISSING 

Distinct: 2
Distinct (%): < 0.1%
Missing: 2439
Missing (%): 5.1%
Memory size: 371.4 KiB

Value      Count  Frequency (%)
True       31028  65.3%
False      14053  29.6%
(Missing)  2439   5.1%

construction_year
Real number (ℝ)

ZEROS 

Distinct: 55
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1303.353199
Minimum: 0
Maximum: 2013
Zeros: 16503
Zeros (%): 34.7%
Negative: 0
Negative (%): 0.0%
Memory size: 371.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1986
Q3: 2004
95-th percentile: 2010
Maximum: 2013
Range: 2013
Interquartile range (IQR): 2004

Descriptive statistics

Standard deviation: 950.763878
Coefficient of variation (CV): 0.7294752328
Kurtosis: -1.588462948
Mean: 1303.353199
Median Absolute Deviation (MAD): 22
Skewness: -0.6411804174
Sum: 61935344
Variance: 903951.9517
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0                  16503  34.7%
2010               2133   4.5%
2008               2124   4.5%
2009               2027   4.3%
2000               1682   3.5%
2007               1275   2.7%
2006               1174   2.5%
2003               1035   2.2%
2011               1003   2.1%
2012               883    1.9%
Other values (45)  17681  37.2%

Value  Count  Frequency (%)
0      16503  34.7%
1960   87     0.2%
1961   16     < 0.1%
1962   27     0.1%
1963   76     0.2%

Value  Count  Frequency (%)
2013   134    0.3%
2012   883    1.9%
2011   1003   2.1%
2010   2133   4.5%
2009   2027   4.3%
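
The smallest construction_year values jump from 0 straight to 1960, which suggests (the report does not say so) that 0 stands in for an unrecorded year. Under that assumption, the sketch below derives the mean over recorded years from the figures in this report alone: the zero rows add nothing to the reported sum, so dividing the sum by the non-zero row count is enough. The variable names are illustrative, and the row count of 47520 comes from the dataset overview.

```python
# Values copied from the construction_year summary in this report.
n_rows = 47520        # number of observations (dataset overview)
n_zeros = 16503       # rows where construction_year == 0
total = 61935344      # reported Sum; zero rows contribute nothing to it

# Assumption: 0 marks an unrecorded year, so those rows are excluded.
n_known = n_rows - n_zeros
mean_known = total / n_known

print(f"rows with a recorded year: {n_known}")
print(f"mean over recorded years: {mean_known:.2f}")
```

The reported mean of 1303.35 mixes real years with the zero rows, which is why it falls below every real year in the table.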
Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 25
Median length: 17
Mean length: 7.729356061
Min length: 3

Characters and Unicode

Total characters: 367299
Distinct characters: 29
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gravity
2nd row: india mark ii
3rd row: other
4th row: gravity
5th row: other

Value              Count  Frequency (%)
gravity            21340  37.9%
nira/tanira        6566   11.7%
other              5776   10.3%
submersible        3851   6.8%
swn                3143   5.6%
80                 2965   5.3%
mono               2284   4.1%
india              1991   3.5%
mark               1991   3.5%
ii                 1920   3.4%
Other values (13)  4516   8.0%

Most occurring characters

ValueCountFrequency (%)
i 48049
13.1%
r 47875
13.0%
a 46574
12.7%
t 33682
9.2%
v 22749
 
6.2%
y 21412
 
5.8%
g 21342
 
5.8%
n 20638
 
5.6%
e 15337
 
4.2%
s 11958
 
3.3%
Other values (19) 77683
21.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 344996
93.9%
Space Separator 8823
 
2.4%
Other Punctuation 6568
 
1.8%
Decimal Number 6286
 
1.7%
Dash Punctuation 626
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 48049
13.9%
r 47875
13.9%
a 46574
13.5%
t 33682
9.8%
v 22749
6.6%
y 21412
 
6.2%
g 21342
 
6.2%
n 20638
 
6.0%
e 15337
 
4.4%
s 11958
 
3.5%
Other values (13) 55380
16.1%
Decimal Number
ValueCountFrequency (%)
8 3143
50.0%
0 2965
47.2%
1 178
 
2.8%
Space Separator
ValueCountFrequency (%)
8823
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 6568
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 626
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 344996
93.9%
Common 22303
 
6.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 48049
13.9%
r 47875
13.9%
a 46574
13.5%
t 33682
9.8%
v 22749
6.6%
y 21412
 
6.2%
g 21342
 
6.2%
n 20638
 
6.0%
e 15337
 
4.4%
s 11958
 
3.5%
Other values (13) 55380
16.1%
Common
ValueCountFrequency (%)
8823
39.6%
/ 6568
29.4%
8 3143
 
14.1%
0 2965
 
13.3%
- 626
 
2.8%
1 178
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 367299
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 48049
13.1%
r 47875
13.0%
a 46574
12.7%
t 33682
9.2%
v 22749
 
6.2%
y 21412
 
5.8%
g 21342
 
5.8%
n 20638
 
5.6%
e 15337
 
4.2%
s 11958
 
3.3%
Other values (19) 77683
21.1%
Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 15
Median length: 14
Mean length: 7.884617003
Min length: 4

Characters and Unicode

Total characters: 374677
Distinct characters: 26
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gravity
2nd row: india mark ii
3rd row: other
4th row: gravity
5th row: other

Value             Count  Frequency (%)
gravity           21340  38.6%
nira/tanira       6566   11.9%
other             5543   10.0%
submersible       4962   9.0%
swn               2965   5.4%
80                2965   5.4%
mono              2284   4.1%
mark              1991   3.6%
india             1991   3.6%
ii                1920   3.5%
Other values (7)  2709   4.9%

Most occurring characters

ValueCountFrequency (%)
i 48962
13.1%
r 48939
13.1%
a 46720
12.5%
t 33551
9.0%
v 22749
 
6.1%
g 21340
 
5.7%
y 21340
 
5.7%
n 20747
 
5.5%
e 17420
 
4.6%
s 12889
 
3.4%
Other values (16) 80020
21.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 354381
94.6%
Space Separator 7716
 
2.1%
Other Punctuation 6566
 
1.8%
Decimal Number 5930
 
1.6%
Dash Punctuation 84
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 48962
13.8%
r 48939
13.8%
a 46720
13.2%
t 33551
9.5%
v 22749
 
6.4%
g 21340
 
6.0%
y 21340
 
6.0%
n 20747
 
5.9%
e 17420
 
4.9%
s 12889
 
3.6%
Other values (11) 59724
16.9%
Decimal Number
ValueCountFrequency (%)
8 2965
50.0%
0 2965
50.0%
Space Separator
ValueCountFrequency (%)
7716
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 6566
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 84
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 354381
94.6%
Common 20296
 
5.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 48962
13.8%
r 48939
13.8%
a 46720
13.2%
t 33551
9.5%
v 22749
 
6.4%
g 21340
 
6.0%
y 21340
 
6.0%
n 20747
 
5.9%
e 17420
 
4.9%
s 12889
 
3.6%
Other values (11) 59724
16.9%
Common
ValueCountFrequency (%)
7716
38.0%
/ 6566
32.4%
8 2965
 
14.6%
0 2965
 
14.6%
- 84
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 374677
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 48962
13.1%
r 48939
13.1%
a 46720
12.5%
t 33551
9.0%
v 22749
 
6.1%
g 21340
 
5.7%
y 21340
 
5.7%
n 20747
 
5.5%
e 17420
 
4.6%
s 12889
 
3.4%
Other values (16) 80020
21.4%
Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 12
Median length: 11
Mean length: 7.604250842
Min length: 5

Characters and Unicode

Total characters: 361354
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: gravity
2nd row: handpump
3rd row: other
4th row: gravity
5th row: other

Value         Count  Frequency (%)
gravity       21340  44.6%
handpump      13222  27.6%
other         5150   10.8%
submersible   4962   10.4%
motorpump     2386   5.0%
rope          376    0.8%
pump          376    0.8%
wind-powered  84     0.2%

Most occurring characters

ValueCountFrequency (%)
a 34562
 
9.6%
r 34298
 
9.5%
p 32428
 
9.0%
t 28876
 
8.0%
i 26386
 
7.3%
m 23332
 
6.5%
g 21340
 
5.9%
y 21340
 
5.9%
v 21340
 
5.9%
u 20946
 
5.8%
Other values (11) 96506
26.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 360894
99.9%
Space Separator 376
 
0.1%
Dash Punctuation 84
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 34562
 
9.6%
r 34298
 
9.5%
p 32428
 
9.0%
t 28876
 
8.0%
i 26386
 
7.3%
m 23332
 
6.5%
g 21340
 
5.9%
y 21340
 
5.9%
v 21340
 
5.9%
u 20946
 
5.8%
Other values (9) 96046
26.6%
Space Separator
ValueCountFrequency (%)
376
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 84
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 360894
99.9%
Common 460
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 34562
 
9.6%
r 34298
 
9.5%
p 32428
 
9.0%
t 28876
 
8.0%
i 26386
 
7.3%
m 23332
 
6.5%
g 21340
 
5.9%
y 21340
 
5.9%
v 21340
 
5.9%
u 20946
 
5.8%
Other values (9) 96046
26.6%
Common
ValueCountFrequency (%)
376
81.7%
- 84
 
18.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 361354
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 34562
 
9.6%
r 34298
 
9.5%
p 32428
 
9.0%
t 28876
 
8.0%
i 26386
 
7.3%
m 23332
 
6.5%
g 21340
 
5.9%
y 21340
 
5.9%
v 21340
 
5.9%
u 20946
 
5.8%
Other values (11) 96506
26.7%
Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 16
Median length: 3
Mean length: 4.341582492
Min length: 3

Characters and Unicode

Total characters: 206312
Distinct characters: 23
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: water board
2nd row: vwc
3rd row: vwc
4th row: vwc
5th row: vwc

Value             Count  Frequency (%)
vwc               32455  62.1%
wug               5204   10.0%
water             3042   5.8%
board             2326   4.4%
wua               2033   3.9%
private           1566   3.0%
operator          1566   3.0%
parastatal        1413   2.7%
other             764    1.5%
authority         716    1.4%
Other values (5)  1205   2.3%

Most occurring characters

ValueCountFrequency (%)
w 43190
20.9%
v 34021
16.5%
c 33060
16.0%
a 17425
8.4%
r 13022
 
6.3%
t 11322
 
5.5%
u 8472
 
4.1%
o 8080
 
3.9%
e 6938
 
3.4%
g 5204
 
2.5%
Other values (13) 25578
12.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 201461
97.6%
Space Separator 4770
 
2.3%
Dash Punctuation 81
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
w 43190
21.4%
v 34021
16.9%
c 33060
16.4%
a 17425
8.6%
r 13022
 
6.5%
t 11322
 
5.6%
u 8472
 
4.2%
o 8080
 
4.0%
e 6938
 
3.4%
g 5204
 
2.6%
Other values (11) 20727
10.3%
Space Separator
ValueCountFrequency (%)
4770
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 81
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 201461
97.6%
Common 4851
 
2.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
w 43190
21.4%
v 34021
16.9%
c 33060
16.4%
a 17425
8.6%
r 13022
 
6.5%
t 11322
 
5.6%
u 8472
 
4.2%
o 8080
 
4.0%
e 6938
 
3.4%
g 5204
 
2.6%
Other values (11) 20727
10.3%
Common
ValueCountFrequency (%)
4770
98.3%
- 81
 
1.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 206312
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
w 43190
20.9%
v 34021
16.5%
c 33060
16.0%
a 17425
8.4%
r 13022
 
6.3%
t 11322
 
5.5%
u 8472
 
4.1%
o 8080
 
3.9%
e 6938
 
3.4%
g 5204
 
2.5%
Other values (13) 25578
12.4%
Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 10
Median length: 10
Mean length: 9.890824916
Min length: 5

Characters and Unicode

Total characters: 470012
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: user-group
2nd row: user-group
3rd row: user-group
4th row: user-group
5th row: user-group

Value       Count  Frequency (%)
user-group  42018  88.4%
commercial  2869   6.0%
parastatal  1413   3.0%
other       764    1.6%
unknown     456    1.0%

Most occurring characters

ValueCountFrequency (%)
r 89082
19.0%
u 84492
18.0%
o 46107
9.8%
e 45651
9.7%
s 43431
9.2%
p 43431
9.2%
- 42018
8.9%
g 42018
8.9%
a 8521
 
1.8%
m 5738
 
1.2%
Other values (8) 19523
 
4.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 427994
91.1%
Dash Punctuation 42018
 
8.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 89082
20.8%
u 84492
19.7%
o 46107
10.8%
e 45651
10.7%
s 43431
10.1%
p 43431
10.1%
g 42018
9.8%
a 8521
 
2.0%
m 5738
 
1.3%
c 5738
 
1.3%
Other values (7) 13785
 
3.2%
Dash Punctuation
ValueCountFrequency (%)
- 42018
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 427994
91.1%
Common 42018
 
8.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 89082
20.8%
u 84492
19.7%
o 46107
10.8%
e 45651
10.7%
s 43431
10.1%
p 43431
10.1%
g 42018
9.8%
a 8521
 
2.0%
m 5738
 
1.3%
c 5738
 
1.3%
Other values (7) 13785
 
3.2%
Common
ValueCountFrequency (%)
- 42018
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 470012
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 89082
19.0%
u 84492
18.0%
o 46107
9.8%
e 45651
9.7%
s 43431
9.2%
p 43431
9.2%
- 42018
8.9%
g 42018
8.9%
a 8521
 
1.8%
m 5738
 
1.2%
Other values (8) 19523
 
4.2%
Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 21
Median length: 14
Mean length: 10.66984428
Min length: 5

Characters and Unicode

Total characters: 507031
Distinct characters: 21
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: pay per bucket
2nd row: never pay
3rd row: never pay
4th row: pay monthly
5th row: pay when scheme fails

Value     Count  Frequency (%)
pay       40155  39.7%
never     20318  20.1%
per       7223   7.1%
bucket    7223   7.1%
monthly   6574   6.5%
unknown   6521   6.4%
when      3154   3.1%
scheme    3154   3.1%
fails     3154   3.1%
annually  2886   2.9%

Most occurring characters

ValueCountFrequency (%)
e 65388
12.9%
n 55381
10.9%
53686
10.6%
y 49615
9.8%
a 49081
9.7%
p 47378
9.3%
r 28385
 
5.6%
v 20318
 
4.0%
u 16630
 
3.3%
l 15500
 
3.1%
Other values (11) 105669
20.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 453345
89.4%
Space Separator 53686
 
10.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 65388
14.4%
n 55381
12.2%
y 49615
10.9%
a 49081
10.8%
p 47378
10.5%
r 28385
 
6.3%
v 20318
 
4.5%
u 16630
 
3.7%
l 15500
 
3.4%
t 14641
 
3.2%
Other values (10) 91028
20.1%
Space Separator
ValueCountFrequency (%)
53686
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 453345
89.4%
Common 53686
 
10.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 65388
14.4%
n 55381
12.2%
y 49615
10.9%
a 49081
10.8%
p 47378
10.5%
r 28385
 
6.3%
v 20318
 
4.5%
u 16630
 
3.7%
l 15500
 
3.4%
t 14641
 
3.2%
Other values (10) 91028
20.1%
Common
ValueCountFrequency (%)
53686
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 507031
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 65388
12.9%
n 55381
10.9%
53686
10.6%
y 49615
9.8%
a 49081
9.7%
p 47378
9.3%
r 28385
 
5.6%
v 20318
 
4.0%
u 16630
 
3.3%
l 15500
 
3.1%
Other values (11) 105669
20.8%
Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 10
Median length: 9
Mean length: 8.535458754
Min length: 5

Characters and Unicode

Total characters: 405605
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: per bucket
2nd row: never pay
3rd row: never pay
4th row: monthly
5th row: on failure

Value     Count  Frequency (%)
never     20318  26.0%
pay       20318  26.0%
per       7223   9.2%
bucket    7223   9.2%
monthly   6574   8.4%
unknown   6521   8.3%
on        3154   4.0%
failure   3154   4.0%
annually  2886   3.7%
other     844    1.1%

Most occurring characters

ValueCountFrequency (%)
e 59080
14.6%
n 55381
13.7%
r 31539
 
7.8%
30695
 
7.6%
y 29778
 
7.3%
a 29244
 
7.2%
p 27541
 
6.8%
v 20318
 
5.0%
u 19784
 
4.9%
o 17093
 
4.2%
Other values (10) 85152
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 374910
92.4%
Space Separator 30695
 
7.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 59080
15.8%
n 55381
14.8%
r 31539
8.4%
y 29778
7.9%
a 29244
7.8%
p 27541
 
7.3%
v 20318
 
5.4%
u 19784
 
5.3%
o 17093
 
4.6%
l 15500
 
4.1%
Other values (9) 69652
18.6%
Space Separator
ValueCountFrequency (%)
30695
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 374910
92.4%
Common 30695
 
7.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 59080
15.8%
n 55381
14.8%
r 31539
8.4%
y 29778
7.9%
a 29244
7.8%
p 27541
 
7.3%
v 20318
 
5.4%
u 19784
 
5.3%
o 17093
 
4.6%
l 15500
 
4.1%
Other values (9) 69652
18.6%
Common
ValueCountFrequency (%)
30695
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 405605
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 59080
14.6%
n 55381
13.7%
r 31539
 
7.8%
30695
 
7.6%
y 29778
 
7.3%
a 29244
 
7.2%
p 27541
 
6.8%
v 20318
 
5.0%
u 19784
 
4.9%
o 17093
 
4.2%
Other values (10) 85152
21.0%
Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 18
Median length: 4
Mean length: 4.301746633
Min length: 4

Characters and Unicode

Total characters: 204419
Distinct characters: 19
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: soft
2nd row: soft
3rd row: soft
4th row: soft
5th row: salty
Value       Count   Frequency (%)
soft        40633   85.0%
salty        4173    8.7%
unknown      1490    3.1%
milky         650    1.4%
coloured      395    0.8%
abandoned     275    0.6%
fluoride      179    0.4%

Most occurring characters

ValueCountFrequency (%)
s 44806
21.9%
t 44806
21.9%
o 43367
21.2%
f 40812
20.0%
l 5397
 
2.6%
n 5020
 
2.5%
y 4823
 
2.4%
a 4723
 
2.3%
k 2140
 
1.0%
u 2064
 
1.0%
Other values (9) 6461
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 204144
99.9%
Space Separator 275
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 44806
21.9%
t 44806
21.9%
o 43367
21.2%
f 40812
20.0%
l 5397
 
2.6%
n 5020
 
2.5%
y 4823
 
2.4%
a 4723
 
2.3%
k 2140
 
1.0%
u 2064
 
1.0%
Other values (8) 6186
 
3.0%
Space Separator
ValueCountFrequency (%)
275
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 204144
99.9%
Common 275
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 44806
21.9%
t 44806
21.9%
o 43367
21.2%
f 40812
20.0%
l 5397
 
2.6%
n 5020
 
2.5%
y 4823
 
2.4%
a 4723
 
2.3%
k 2140
 
1.0%
u 2064
 
1.0%
Other values (8) 6186
 
3.0%
Common
ValueCountFrequency (%)
275
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 204419
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 44806
21.9%
t 44806
21.9%
o 43367
21.2%
f 40812
20.0%
l 5397
 
2.6%
n 5020
 
2.5%
y 4823
 
2.4%
a 4723
 
2.3%
k 2140
 
1.0%
u 2064
 
1.0%
Other values (9) 6461
 
3.2%
Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 8
Median length: 4
Mean length: 4.235563973
Min length: 4

Characters and Unicode

Total characters: 201274
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: good
2nd row: good
3rd row: good
4th row: good
5th row: salty
Value       Count   Frequency (%)
good        40633   85.5%
salty        4173    8.8%
unknown      1490    3.1%
milky         650    1.4%
colored       395    0.8%
fluoride      179    0.4%

Most occurring characters

ValueCountFrequency (%)
o 83725
41.6%
d 41207
20.5%
g 40633
20.2%
l 5397
 
2.7%
y 4823
 
2.4%
n 4470
 
2.2%
t 4173
 
2.1%
a 4173
 
2.1%
s 4173
 
2.1%
k 2140
 
1.1%
Other values (8) 6360
 
3.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 201274
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 83725
41.6%
d 41207
20.5%
g 40633
20.2%
l 5397
 
2.7%
y 4823
 
2.4%
n 4470
 
2.2%
t 4173
 
2.1%
a 4173
 
2.1%
s 4173
 
2.1%
k 2140
 
1.1%
Other values (8) 6360
 
3.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 201274
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 83725
41.6%
d 41207
20.5%
g 40633
20.2%
l 5397
 
2.7%
y 4823
 
2.4%
n 4470
 
2.2%
t 4173
 
2.1%
a 4173
 
2.1%
s 4173
 
2.1%
k 2140
 
1.1%
Other values (8) 6360
 
3.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 201274
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 83725
41.6%
d 41207
20.5%
g 40633
20.2%
l 5397
 
2.7%
y 4823
 
2.4%
n 4470
 
2.2%
t 4173
 
2.1%
a 4173
 
2.1%
s 4173
 
2.1%
k 2140
 
1.1%
Other values (8) 6360
 
3.2%
Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 12
Median length: 6
Mean length: 7.360079966
Min length: 3

Characters and Unicode

Total characters: 349751
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: insufficient
2nd row: enough
3rd row: enough
4th row: insufficient
5th row: enough
Value          Count   Frequency (%)
enough         26538   55.8%
insufficient   12104   25.5%
dry             5024   10.6%
seasonal        3225    6.8%
unknown          629    1.3%

Most occurring characters

ValueCountFrequency (%)
n 55858
16.0%
e 41867
12.0%
u 39271
11.2%
i 36312
10.4%
o 30392
8.7%
g 26538
7.6%
h 26538
7.6%
f 24208
6.9%
s 18554
 
5.3%
t 12104
 
3.5%
Other values (8) 38109
10.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 349751
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 55858
16.0%
e 41867
12.0%
u 39271
11.2%
i 36312
10.4%
o 30392
8.7%
g 26538
7.6%
h 26538
7.6%
f 24208
6.9%
s 18554
 
5.3%
t 12104
 
3.5%
Other values (8) 38109
10.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 349751
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 55858
16.0%
e 41867
12.0%
u 39271
11.2%
i 36312
10.4%
o 30392
8.7%
g 26538
7.6%
h 26538
7.6%
f 24208
6.9%
s 18554
 
5.3%
t 12104
 
3.5%
Other values (8) 38109
10.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 349751
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 55858
16.0%
e 41867
12.0%
u 39271
11.2%
i 36312
10.4%
o 30392
8.7%
g 26538
7.6%
h 26538
7.6%
f 24208
6.9%
s 18554
 
5.3%
t 12104
 
3.5%
Other values (8) 38109
10.9%
Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 12
Median length: 6
Mean length: 7.360079966
Min length: 3

Characters and Unicode

Total characters: 349751
Distinct characters: 18
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: insufficient
2nd row: enough
3rd row: enough
4th row: insufficient
5th row: enough
Value          Count   Frequency (%)
enough         26538   55.8%
insufficient   12104   25.5%
dry             5024   10.6%
seasonal        3225    6.8%
unknown          629    1.3%

Most occurring characters

ValueCountFrequency (%)
n 55858
16.0%
e 41867
12.0%
u 39271
11.2%
i 36312
10.4%
o 30392
8.7%
g 26538
7.6%
h 26538
7.6%
f 24208
6.9%
s 18554
 
5.3%
t 12104
 
3.5%
Other values (8) 38109
10.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 349751
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 55858
16.0%
e 41867
12.0%
u 39271
11.2%
i 36312
10.4%
o 30392
8.7%
g 26538
7.6%
h 26538
7.6%
f 24208
6.9%
s 18554
 
5.3%
t 12104
 
3.5%
Other values (8) 38109
10.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 349751
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 55858
16.0%
e 41867
12.0%
u 39271
11.2%
i 36312
10.4%
o 30392
8.7%
g 26538
7.6%
h 26538
7.6%
f 24208
6.9%
s 18554
 
5.3%
t 12104
 
3.5%
Other values (8) 38109
10.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 349751
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 55858
16.0%
e 41867
12.0%
u 39271
11.2%
i 36312
10.4%
o 30392
8.7%
g 26538
7.6%
h 26538
7.6%
f 24208
6.9%
s 18554
 
5.3%
t 12104
 
3.5%
Other values (8) 38109
10.9%

source
Text

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 20
Median length: 12
Mean length: 8.986637205
Min length: 3

Characters and Unicode

Total characters: 427045
Distinct characters: 21
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: spring
2nd row: shallow well
3rd row: shallow well
4th row: river
5th row: shallow well
Value              Count   Frequency (%)
shallow            13540   18.7%
well               13540   18.7%
spring             13537   18.7%
machine             8849   12.2%
dbh                 8849   12.2%
river               7719   10.7%
rainwater           1829    2.5%
harvesting          1829    2.5%
hand                 701    1.0%
dtw                  701    1.0%
Other values (4)    1345    1.9%

Most occurring characters

ValueCountFrequency (%)
l 54766
12.8%
r 34640
 
8.1%
e 34550
 
8.1%
h 33946
 
7.9%
i 33763
 
7.9%
a 29688
 
7.0%
w 29666
 
6.9%
s 28906
 
6.8%
n 26913
 
6.3%
24919
 
5.8%
Other values (11) 95288
22.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 402126
94.2%
Space Separator 24919
 
5.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 54766
13.6%
r 34640
8.6%
e 34550
8.6%
h 33946
8.4%
i 33763
8.4%
a 29688
 
7.4%
w 29666
 
7.4%
s 28906
 
7.2%
n 26913
 
6.7%
g 15366
 
3.8%
Other values (10) 79922
19.9%
Space Separator
ValueCountFrequency (%)
24919
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 402126
94.2%
Common 24919
 
5.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 54766
13.6%
r 34640
8.6%
e 34550
8.6%
h 33946
8.4%
i 33763
8.4%
a 29688
 
7.4%
w 29666
 
7.4%
s 28906
 
7.2%
n 26913
 
6.7%
g 15366
 
3.8%
Other values (10) 79922
19.9%
Common
ValueCountFrequency (%)
24919
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 427045
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l 54766
12.8%
r 34640
 
8.1%
e 34550
 
8.1%
h 33946
 
7.9%
i 33763
 
7.9%
a 29688
 
7.0%
w 29666
 
6.9%
s 28906
 
6.8%
n 26913
 
6.3%
24919
 
5.8%
Other values (11) 95288
22.3%
Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 20
Median length: 12
Mean length: 9.314330808
Min length: 3

Characters and Unicode

Total characters: 442617
Distinct characters: 20
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: spring
2nd row: shallow well
3rd row: shallow well
4th row: river/lake
5th row: shallow well
Value        Count   Frequency (%)
shallow      13540   21.5%
well         13540   21.5%
spring       13537   21.5%
borehole      9550   15.2%
river/lake    8325   13.2%
rainwater     1829    2.9%
harvesting    1829    2.9%
dam            505    0.8%
other          234    0.4%

Most occurring characters

ValueCountFrequency (%)
l 72035
16.3%
e 53182
12.0%
r 45458
10.3%
o 32874
 
7.4%
w 28909
 
6.5%
s 28906
 
6.5%
a 27857
 
6.3%
i 25520
 
5.8%
h 25153
 
5.7%
n 17195
 
3.9%
Other values (10) 85528
19.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 418923
94.6%
Space Separator 15369
 
3.5%
Other Punctuation 8325
 
1.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 72035
17.2%
e 53182
12.7%
r 45458
10.9%
o 32874
7.8%
w 28909
6.9%
s 28906
6.9%
a 27857
 
6.6%
i 25520
 
6.1%
h 25153
 
6.0%
n 17195
 
4.1%
Other values (8) 61834
14.8%
Space Separator
ValueCountFrequency (%)
15369
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 8325
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 418923
94.6%
Common 23694
 
5.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 72035
17.2%
e 53182
12.7%
r 45458
10.9%
o 32874
7.8%
w 28909
6.9%
s 28906
6.9%
a 27857
 
6.6%
i 25520
 
6.1%
h 25153
 
6.0%
n 17195
 
4.1%
Other values (8) 61834
14.8%
Common
ValueCountFrequency (%)
15369
64.9%
/ 8325
35.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 442617
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
l 72035
16.3%
e 53182
12.0%
r 45458
10.3%
o 32874
 
7.4%
w 28909
 
6.5%
s 28906
 
6.5%
a 27857
 
6.3%
i 25520
 
5.8%
h 25153
 
5.7%
n 17195
 
3.9%
Other values (10) 85528
19.3%
Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 11
Median length: 11
Mean length: 10.08308081
Min length: 7

Characters and Unicode

Total characters: 479148
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: groundwater
2nd row: groundwater
3rd row: groundwater
4th row: surface
5th row: groundwater
Value         Count   Frequency (%)
groundwater   36627   77.1%
surface       10659   22.4%
unknown         234    0.5%

Most occurring characters

ValueCountFrequency (%)
r 83913
17.5%
u 47520
9.9%
a 47286
9.9%
e 47286
9.9%
n 37329
7.8%
o 36861
7.7%
w 36861
7.7%
g 36627
7.6%
d 36627
7.6%
t 36627
7.6%
Other values (4) 32211
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 479148
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 83913
17.5%
u 47520
9.9%
a 47286
9.9%
e 47286
9.9%
n 37329
7.8%
o 36861
7.7%
w 36861
7.7%
g 36627
7.6%
d 36627
7.6%
t 36627
7.6%
Other values (4) 32211
 
6.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 479148
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 83913
17.5%
u 47520
9.9%
a 47286
9.9%
e 47286
9.9%
n 37329
7.8%
o 36861
7.7%
w 36861
7.7%
g 36627
7.6%
d 36627
7.6%
t 36627
7.6%
Other values (4) 32211
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 479148
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 83913
17.5%
u 47520
9.9%
a 47286
9.9%
e 47286
9.9%
n 37329
7.8%
o 36861
7.7%
w 36861
7.7%
g 36627
7.6%
d 36627
7.6%
t 36627
7.6%
Other values (4) 32211
 
6.7%
Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 27
Median length: 18
Mean length: 14.80359848
Min length: 3

Characters and Unicode

Total characters: 703467
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: communal standpipe
2nd row: hand pump
3rd row: other
4th row: communal standpipe
5th row: other
Value       Count   Frequency (%)
communal    27615   29.1%
standpipe   27615   29.1%
hand        14073   14.8%
pump        14073   14.8%
other        5098    5.4%
multiple     4830    5.1%
improved      639    0.7%
spring        639    0.7%
cattle         91    0.1%
trough         91    0.1%

Most occurring characters

ValueCountFrequency (%)
p 89484
12.7%
m 74776
10.6%
n 69942
9.9%
a 69398
9.9%
47248
 
6.7%
u 46609
 
6.6%
d 42331
 
6.0%
e 38273
 
5.4%
t 37816
 
5.4%
l 37366
 
5.3%
Other values (8) 150224
21.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 656219
93.3%
Space Separator 47248
 
6.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
p 89484
13.6%
m 74776
11.4%
n 69942
10.7%
a 69398
10.6%
u 46609
7.1%
d 42331
 
6.5%
e 38273
 
5.8%
t 37816
 
5.8%
l 37366
 
5.7%
i 33723
 
5.1%
Other values (7) 116501
17.8%
Space Separator
ValueCountFrequency (%)
47248
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 656219
93.3%
Common 47248
 
6.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
p 89484
13.6%
m 74776
11.4%
n 69942
10.7%
a 69398
10.6%
u 46609
7.1%
d 42331
 
6.5%
e 38273
 
5.8%
t 37816
 
5.8%
l 37366
 
5.7%
i 33723
 
5.1%
Other values (7) 116501
17.8%
Common
ValueCountFrequency (%)
47248
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 703467
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
p 89484
12.7%
m 74776
10.6%
n 69942
9.9%
a 69398
9.9%
47248
 
6.7%
u 46609
 
6.6%
d 42331
 
6.0%
e 38273
 
5.4%
t 37816
 
5.4%
l 37366
 
5.3%
Other values (8) 150224
21.4%
Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 18
Median length: 18
Mean length: 13.88882576
Min length: 3

Characters and Unicode

Total characters: 659997
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: communal standpipe
2nd row: hand pump
3rd row: other
4th row: communal standpipe
5th row: other
Value       Count   Frequency (%)
communal    27615   30.7%
standpipe   27615   30.7%
hand        14073   15.6%
pump        14073   15.6%
other        5098    5.7%
improved      639    0.7%
spring        639    0.7%
cattle         91    0.1%
trough         91    0.1%
dam             4   < 0.1%

Most occurring characters

ValueCountFrequency (%)
p 84654
12.8%
m 69946
10.6%
n 69942
10.6%
a 69398
10.5%
42418
 
6.4%
d 42331
 
6.4%
u 41779
 
6.3%
e 33443
 
5.1%
o 33443
 
5.1%
t 32986
 
5.0%
Other values (8) 139657
21.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 617579
93.6%
Space Separator 42418
 
6.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
p 84654
13.7%
m 69946
11.3%
n 69942
11.3%
a 69398
11.2%
d 42331
 
6.9%
u 41779
 
6.8%
e 33443
 
5.4%
o 33443
 
5.4%
t 32986
 
5.3%
i 28893
 
4.7%
Other values (7) 110764
17.9%
Space Separator
ValueCountFrequency (%)
42418
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 617579
93.6%
Common 42418
 
6.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
p 84654
13.7%
m 69946
11.3%
n 69942
11.3%
a 69398
11.2%
d 42331
 
6.9%
u 41779
 
6.8%
e 33443
 
5.4%
o 33443
 
5.4%
t 32986
 
5.3%
i 28893
 
4.7%
Other values (7) 110764
17.9%
Common
ValueCountFrequency (%)
42418
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 659997
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
p 84654
12.8%
m 69946
10.6%
n 69942
10.6%
a 69398
10.5%
42418
 
6.4%
d 42331
 
6.4%
u 41779
 
6.3%
e 33443
 
5.1%
o 33443
 
5.1%
t 32986
 
5.0%
Other values (8) 139657
21.2%
Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 371.4 KiB

Length

Max length: 23
Median length: 10
Mean length: 12.48455387
Min length: 10

Characters and Unicode

Total characters: 593266
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: functional
2nd row: functional
3rd row: non functional
4th row: non functional
5th row: non functional
Value        Count   Frequency (%)
functional   47520   65.4%
non          18252   25.1%
needs         3466    4.8%
repair        3466    4.8%

Most occurring characters

ValueCountFrequency (%)
n 135010
22.8%
o 65772
11.1%
i 50986
 
8.6%
a 50986
 
8.6%
f 47520
 
8.0%
u 47520
 
8.0%
c 47520
 
8.0%
t 47520
 
8.0%
l 47520
 
8.0%
25184
 
4.2%
Other values (5) 27728
 
4.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 568082
95.8%
Space Separator 25184
 
4.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 135010
23.8%
o 65772
11.6%
i 50986
 
9.0%
a 50986
 
9.0%
f 47520
 
8.4%
u 47520
 
8.4%
c 47520
 
8.4%
t 47520
 
8.4%
l 47520
 
8.4%
e 10398
 
1.8%
Other values (4) 17330
 
3.1%
Space Separator
ValueCountFrequency (%)
25184
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 568082
95.8%
Common 25184
 
4.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 135010
23.8%
o 65772
11.6%
i 50986
 
9.0%
a 50986
 
9.0%
f 47520
 
8.4%
u 47520
 
8.4%
c 47520
 
8.4%
t 47520
 
8.4%
l 47520
 
8.4%
e 10398
 
1.8%
Other values (4) 17330
 
3.1%
Common
ValueCountFrequency (%)
25184
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 593266
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 135010
22.8%
o 65772
11.1%
i 50986
 
8.6%
a 50986
 
8.6%
f 47520
 
8.0%
u 47520
 
8.0%
c 47520
 
8.0%
t 47520
 
8.0%
l 47520
 
8.0%
25184
 
4.2%
Other values (5) 27728
 
4.7%